 turn-based game


A Game of Pawns

arXiv.org Artificial Intelligence

We introduce and study pawn games, a class of two-player zero-sum turn-based graph games. A turn-based graph game proceeds by placing a token on an initial vertex, and whoever controls the vertex on which the token is located chooses its next location. This leads to a path in the graph, which determines the winner. Traditionally, the control of vertices is predetermined and fixed. The novelty of pawn games is that control of vertices changes dynamically throughout the game as follows. Each vertex of a pawn game is owned by a pawn. In each turn, the pawns are partitioned between the two players, and the player who controls the pawn that owns the vertex on which the token is located chooses the next location of the token. Control of pawns changes dynamically throughout the game according to a fixed mechanism. Specifically, we define several grabbing-based mechanisms in which control of at most one pawn transfers at the end of each turn. We study the complexity of solving pawn games, where we focus on reachability objectives and parameterize the problem by the mechanism that is being used and by restrictions on pawn ownership of vertices. On the positive side, even though pawn games are exponentially-succinct turn-based games, we identify several natural classes that can be solved in PTIME. On the negative side, we identify several EXPTIME-complete classes, where our hardness proofs are based on a new class of games called Lock & Key games, which may be of independent interest.
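For orientation, the traditional fixed-control setting that pawn games generalize can be solved for reachability objectives with the classic attractor computation. The sketch below covers only that baseline, not any of the grabbing-based mechanisms from the abstract. The function name `attractor`, the input encoding, and the example graph are illustrative assumptions, not taken from the paper.

```python
from collections import deque

def attractor(vertices, edges, owner, targets):
    """Vertices from which Player 1 can force the token into `targets`.

    Assumed encoding (hypothetical): `edges` is a list of (u, v) pairs and
    `owner[v]` is 1 or 2, fixed for the whole game.
    """
    succ = {v: [] for v in vertices}
    pred = {v: [] for v in vertices}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # For Player 2 vertices: successors not yet known to be winning for Player 1.
    remaining = {v: len(succ[v]) for v in vertices}
    win = set(targets)
    queue = deque(targets)
    while queue:
        v = queue.popleft()
        for u in pred[v]:
            if u in win:
                continue
            if owner[u] == 1:
                # Player 1 needs just one edge into the winning set.
                win.add(u)
                queue.append(u)
            else:
                # Player 2 loses only when every edge leads into the winning set.
                remaining[u] -= 1
                if remaining[u] == 0:
                    win.add(u)
                    queue.append(u)
    return win

# Small example graph (made up for illustration).
vertices = ["a", "b", "c", "d", "t"]
edges = [("a", "b"), ("a", "c"), ("b", "t"),
         ("c", "t"), ("c", "d"), ("d", "d"), ("t", "t")]
owner = {"a": 1, "b": 2, "c": 2, "d": 1, "t": 1}
winning = attractor(vertices, edges, owner, ["t"])
```

In this example Player 1 wins from `a`, `b`, and `t`: from `c`, Player 2 can move to `d`, whose only edge is a self-loop. The computation is linear in the size of the graph, which is why the exponential succinctness mentioned in the abstract matters: a pawn game's state presumably also has to track who currently controls each pawn.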


Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games

arXiv.org Artificial Intelligence

Real-world applications such as economics and policy making often involve solving multi-agent games with two unique features: (1) The agents are inherently asymmetric and partitioned into leaders and followers; (2) The agents have different reward functions, thus the game is general-sum. The majority of existing results in this field focus on either symmetric solution concepts (e.g. Nash equilibrium) or zero-sum games. It remains largely open how to learn the Stackelberg equilibrium -- an asymmetric analog of the Nash equilibrium -- in general-sum games efficiently from samples. This paper initiates the theoretical study of sample-efficient learning of the Stackelberg equilibrium in two-player turn-based general-sum games. We identify a fundamental gap between the exact value of the Stackelberg equilibrium and its estimated version using finite samples, which cannot be closed information-theoretically regardless of the algorithm. We then establish a positive result on sample-efficient learning of Stackelberg equilibrium with value optimal up to the gap identified above. We show that our sample complexity is tight with matching upper and lower bounds. Finally, we extend our learning results to the setting where the follower plays in a Markov Decision Process (MDP), and the setting where the leader and the follower act simultaneously.
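To make the solution concept concrete, here is a minimal sketch of a Stackelberg equilibrium in a one-step, two-player turn-based general-sum game where both reward tables are known exactly. It does not implement the paper's learning algorithm, which works from samples. The function name `stackelberg`, the table layout, the tie-breaking rule in the leader's favor, and the example payoffs are all assumptions made for illustration.

```python
def stackelberg(leader_reward, follower_reward):
    """Return (leader_action, follower_action, leader_value).

    Assumed layout (hypothetical): reward[a][b] is the payoff when the
    leader commits to action a and the follower then responds with b.
    """
    best = None
    for a, row in enumerate(follower_reward):
        top = max(row)
        # The follower best-responds to the committed action a.
        responses = [b for b, r in enumerate(row) if r == top]
        # Assumption: ties among best responses are broken in the leader's favor.
        b = max(responses, key=lambda j: leader_reward[a][j])
        value = leader_reward[a][b]
        if best is None or value > best[2]:
            best = (a, b, value)
    return best

# Made-up payoffs: the leader's action 0 is strictly dominant,
# yet committing to action 1 earns more once the follower reacts.
leader_reward = [[2, 4],
                 [1, 3]]
follower_reward = [[1, 0],
                   [0, 1]]
solution = stackelberg(leader_reward, follower_reward)
```

With these tables the leader commits to action 1, the follower answers with action 1, and the leader earns 3. Under simultaneous play the leader would pick the dominant action 0 and earn only 2, which illustrates the asymmetry between the Stackelberg and Nash solution concepts.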


Provable Self-Play Algorithms for Competitive Reinforcement Learning

arXiv.org Artificial Intelligence

This paper studies competitive reinforcement learning (competitive RL), that is, reinforcement learning with two or more agents taking actions simultaneously, but each maximizing their own reward. Competitive RL is a major branch of the more general setting of multi-agent reinforcement learning (MARL), with the specification that the agents have conflicting rewards (so that they essentially compete with each other) yet can be trained in a centralized fashion (i.e. each agent has access to the other agents' policies) (Crandall and Goodrich, 2005). There has been substantial recent progress in competitive RL, in particular in solving hard multi-player games such as Go (Silver et al., 2017), Starcraft (Vinyals et al., 2019), and Dota 2 (OpenAI, 2018). A key highlight in their approaches is the successful use of self-play for achieving superhuman performance in the absence of human knowledge or expert opponents. These self-play algorithms are able to learn a good policy for all players from scratch through repeatedly playing the current policies against each other and performing policy updates using these self-played game trajectories. The empirical success of self-play has challenged the conventional wisdom that expert opponents are necessary for achieving good performance, and calls for a better theoretical understanding. In this paper, we take initial steps towards understanding the effectiveness of self-play algorithms in competitive RL from a theoretical perspective.


Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

arXiv.org Machine Learning

We develop provably efficient reinforcement learning algorithms for two-player zero-sum Markov games in which the two players simultaneously take actions. To incorporate function approximation, we consider a family of Markov games where the reward function and transition kernel possess a linear structure. Both the offline and online settings of the problems are considered. In the offline setting, we control both players and the goal is to find the Nash Equilibrium efficiently by minimizing the worst-case duality gap. In the online setting, we control a single player to play against an arbitrary opponent and the goal is to minimize the regret. For both settings, we propose an optimistic variant of the least-squares minimax value iteration algorithm. We show that our algorithm is computationally efficient and provably achieves an $\tilde O(\sqrt{d^3 H^3 T})$ upper bound on the duality gap and regret, without requiring additional assumptions on the sampling model. We highlight that our setting requires overcoming several new challenges that are absent in Markov decision processes or turn-based Markov games. In particular, to achieve optimism in simultaneous-move Markov games, we construct both upper and lower confidence bounds of the value function, and then compute the optimistic policy by solving a general-sum matrix game with these bounds as the payoff matrices. As finding the Nash Equilibrium of such a general-sum game is computationally hard, our algorithm instead solves for a Coarse Correlated Equilibrium (CCE), which can be obtained efficiently via linear programming. To the best of our knowledge, such a CCE-based scheme for implementing optimism has not appeared in the literature and might be of interest in its own right.
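The CCE conditions are linear in the joint distribution, which is what makes a linear-programming formulation possible. The sketch below does not solve that linear program and is not the paper's algorithm; it only checks whether a given joint distribution over action pairs satisfies the CCE inequalities for a two-player general-sum matrix game. It assumes both players maximize their own payoff matrix, and the name `is_cce`, the tolerance, and the example game are hypothetical.

```python
def is_cce(sigma, u1, u2, tol=1e-9):
    """Check the coarse correlated equilibrium conditions.

    Assumed layout (hypothetical): sigma[a][b] is the probability of the
    joint action (a, b); u1[a][b] and u2[a][b] are the payoffs that
    player 1 and player 2, respectively, want to maximize.
    """
    n, m = len(u1), len(u1[0])
    pairs = [(a, b) for a in range(n) for b in range(m)]

    # Expected payoff of each player when both follow the recommendation.
    v1 = sum(sigma[a][b] * u1[a][b] for a, b in pairs)
    v2 = sum(sigma[a][b] * u2[a][b] for a, b in pairs)

    # Marginals faced by a player who ignores the recommendation.
    col = [sum(sigma[a][b] for a in range(n)) for b in range(m)]
    row = [sum(sigma[a][b] for b in range(m)) for a in range(n)]

    # Best fixed deviation for each player against the other's marginal.
    dev1 = max(sum(col[b] * u1[a][b] for b in range(m)) for a in range(n))
    dev2 = max(sum(row[a] * u2[a][b] for a in range(n)) for b in range(m))

    return dev1 <= v1 + tol and dev2 <= v2 + tol

# Made-up coordination game: both players earn 1 when their actions match.
u1 = [[1, 0], [0, 1]]
u2 = [[1, 0], [0, 1]]
matched = [[0.5, 0.0], [0.0, 0.5]]      # always recommends matching actions
mismatched = [[0.0, 0.5], [0.5, 0.0]]   # always recommends mismatched actions
```

Here `is_cce(matched, u1, u2)` holds, while `is_cce(mismatched, u1, u2)` fails because either player gains by switching to any fixed action. Each inequality compares two expressions that are linear in `sigma`, so a feasible distribution could be found with any standard LP solver.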


Hands-on: Age of Wonders: Planetfall puts a refreshing sci-fi spin on 4X strategy

PCWorld

Age of Wonders needed a hook, and I think Triumph finally found it. Writing about Age of Wonders III back in 2014, one of my biggest complaints was that it felt somewhat unnecessary. The fantasy 4X genre is nothing if not crowded these days, and despite Age of Wonders lifting as much influence from Heroes of Might and Magic as it does from Civilization, it still didn't offer much reason to persist through its cumbersome systems and slog of a story setup. That genre's considerably more open, especially when it comes to land-based science fiction. In recent years, that category consists of Civilization: Beyond Earth and...not much else.